AITopics | data replication

Finding Dori: Memorization in Text-to-Image Diffusion Models Is Not Local

Kowalczuk, Antoni, Hintersdorf, Dominik, Struppek, Lukas, Kersting, Kristian, Dziedzic, Adam, Boenisch, Franziska

arXiv.org Artificial Intelligence | Oct-15-2025

Text-to-image diffusion models (DMs) have achieved remarkable success in image generation. However, concerns about data privacy and intellectual property remain due to their potential to inadvertently memorize and replicate training data. Recent mitigation efforts have focused on identifying and pruning weights responsible for triggering verbatim training data replication, based on the assumption that memorization can be localized. We challenge this assumption and demonstrate that, even after such pruning, small perturbations to the text embeddings of previously mitigated prompts can re-trigger data replication, revealing the fragility of such defenses. Our further analysis then provides multiple indications that memorization is indeed not inherently local: (1) replication triggers for memorized images are distributed throughout text embedding space; (2) embeddings yielding the same replicated image produce divergent model activations; and (3) different pruning methods identify inconsistent sets of memorization-related weights for the same image. Finally, we show that bypassing the locality assumption enables more robust mitigation through adversarial fine-tuning. These findings provide new insights into the nature of memorization in text-to-image DMs and inform the development of more reliable mitigations against DM memorization.

artificial intelligence, machine learning, memorization, (17 more...)

2507.1688

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.85)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

95dcc1f6463491d37a8918c1d38380a7-Paper-Conference.pdf

Neural Information Processing Systems | Oct-10-2025, 10:20:51 GMT

dataset, diffusion model, dms, (15 more...)

Country:

North America > United States > Michigan (0.04)
Asia > Singapore (0.04)
Asia > Nepal (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(2 more...)

Towards Assessing Data Replication in Music Generation with Music Similarity Metrics on Raw Audio

Batlle-Roca, Roser, Liao, Wei-Hsiang, Serra, Xavier, Mitsufuji, Yuki, Gómez, Emilia

arXiv.org Artificial Intelligence | Aug-1-2024

Recent advancements in music generation are raising multiple concerns about the implications of AI in creative music processes, current business models and impacts related to intellectual property management. A relevant discussion and related technical challenge is the potential replication and plagiarism of the training set in AI-generated music, which could lead to misuse of data and intellectual property rights violations. To tackle this issue, we present the Music Replication Assessment (MiRA) tool: a model-independent open evaluation method based on diverse audio music similarity metrics to assess data replication. We evaluate the ability of five metrics to identify exact replication by conducting a controlled replication experiment in different music genres using synthetic samples. Our results show that the proposed methodology can estimate exact data replication with a proportion higher than 10%. By introducing the MiRA tool, we intend to encourage the open evaluation of music-generative models by researchers, developers, and users concerning data replication, highlighting the importance of the ethical, social, legal, and economic consequences. Code and examples are available for reproducibility purposes.
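A controlled replication experiment of the kind the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the MiRA implementation: the feature vectors stand in for audio embeddings, cosine similarity stands in for the five metrics the paper evaluates, and the names `build_controlled_set`, `estimate_replication_rate`, and the `0.999` threshold are all assumptions made for this example.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_controlled_set(training, n_items, replication_ratio, rng):
    """Build a synthetic 'generated' set with a known share of exact copies.

    Hypothetical helper: the copies are drawn from the training set, the rest
    are fresh random vectors that imitate novel generations.
    """
    n_copies = round(n_items * replication_ratio)
    dim = len(training[0])
    copies = [list(rng.choice(training)) for _ in range(n_copies)]
    fresh = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(n_items - n_copies)]
    return copies + fresh


def estimate_replication_rate(generated, training, threshold=0.999):
    """Fraction of generated items whose closest training item meets the threshold."""
    if not generated:
        return 0.0
    flagged = 0
    for g in generated:
        best = max(cosine_similarity(g, t) for t in training)
        if best >= threshold:
            flagged += 1
    return flagged / len(generated)


# Stand-in "audio features": 50 training items, 64 dimensions each (assumed sizes).
rng = random.Random(0)
training = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(50)]

# Inject a known 20% of exact replicas, then try to recover that proportion.
generated = build_controlled_set(training, n_items=100, replication_ratio=0.2, rng=rng)
rate = estimate_replication_rate(generated, training)
print(f"estimated replication rate: {rate:.2f}")
```

Because the injected ratio is known, the estimate can be compared directly against it, which is the point of running the experiment on synthetic samples before applying a metric to real model outputs.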

data replication, music, replication, (15 more...)

2407.14364

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Spain (0.04)
Europe > Netherlands > Utrecht (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.88)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

From Trojan Horses to Castle Walls: Unveiling Bilateral Backdoor Effects in Diffusion Models

Pan, Zhuoshi, Yao, Yuguang, Liu, Gaowen, Shen, Bingquan, Zhao, H. Vicky, Kompella, Ramana Rao, Liu, Sijia

arXiv.org Artificial Intelligence | Nov-4-2023

While state-of-the-art diffusion models (DMs) excel in image generation, concerns regarding their security persist. Earlier research highlighted DMs' vulnerability to backdoor attacks, but these studies placed stricter requirements than conventional methods like 'BadNets' in image classification, because they necessitate modifications to the diffusion sampling and training procedures. Unlike the prior work, we investigate whether generating backdoor attacks in DMs can be as simple as BadNets, i.e., by only contaminating the training dataset without tampering with the original diffusion process. In this more realistic backdoor setting, we uncover bilateral backdoor effects that not only serve an adversarial purpose (compromising the functionality of DMs) but also offer a defensive advantage (which can be leveraged for backdoor defense). Specifically, we find that a BadNets-like backdoor attack remains effective in DMs for producing incorrect images (misaligned with the intended text conditions), thereby yielding incorrect predictions when DMs are used as classifiers. Meanwhile, backdoored DMs exhibit an increased ratio of backdoor triggers, a phenomenon we refer to as 'trigger amplification', among the generated images. We show that this latter insight can be used to enhance the detection of backdoor-poisoned training data. Even under a low backdoor poisoning ratio, studying the backdoor effects of DMs is also valuable for designing anti-backdoor image classifiers. Last but not least, we establish a meaningful linkage between backdoor attacks and the phenomenon of data replication by exploring DMs' inherent data memorization tendencies. The code for our work is available at https://github.com/OPTML-Group/BiBadDiff.

backdoor attack, backdoor trigger, dms, (12 more...)

2311.02373

Country:

North America > United States > Michigan (0.04)
Asia > Singapore (0.04)
Asia > Nepal (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Testing for the Markov Property in Time Series via Deep Conditional Generative Learning

Zhou, Yunzhe, Shi, Chengchun, Li, Lexin, Yao, Qiwei

arXiv.org Artificial Intelligence | May-30-2023

The Markov property is widely imposed in the analysis of time series data. Correspondingly, testing the Markov property, and relatedly, inferring the order of a Markov model, are of paramount importance. In this article, we propose a nonparametric test for the Markov property in high-dimensional time series via deep conditional generative learning. We also apply the test sequentially to determine the order of the Markov model. We show that the test controls the type-I error asymptotically and has power approaching one. Our proposal makes novel contributions in several ways. We utilize and extend state-of-the-art deep generative learning to estimate the conditional density functions, and establish a sharp upper bound on the approximation error of the estimators. We derive a doubly robust test statistic, which employs nonparametric estimation but achieves a parametric convergence rate. We further adopt sample splitting and cross-fitting to minimize the conditions required to ensure the consistency of the test. We demonstrate the efficacy of the test through both simulations and three data applications.

artificial intelligence, log 1 2, machine learning, (15 more...)

2305.19244

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

What is the Data Architecture we Need?

#artificialintelligence | Feb-10-2020, 03:35:33 GMT

In the new era of Big Data and Data Sciences, it is vitally important for an enterprise to have a centralized data architecture aligned with business processes, which scales with business growth and evolves with technological advancements. A successful data architecture provides clarity about every aspect of the data, which enables data scientists to work with trustable data efficiently and to solve complex business problems. It also prepares an organization to quickly take advantage of new business opportunities by leveraging emerging technologies and improves operational efficiency by managing complex data and information delivery throughout the enterprise. When compared with information architecture, system architecture, and software architecture, data architecture is relatively new. The role of Data Architects has also been nebulous and has fallen on the shoulders of senior business analysts, ETL developers, and data scientists.

architecture, business process, data architecture, (11 more...)

Industry: Information Technology > Security & Privacy (0.30)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.36)
Information Technology > Data Science > Data Integration (0.35)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.35)

How to transform your SAP S/4HANA system in a machine learning (ML) power house by exposing your tailor-made ML models

#artificialintelligence | Jun-25-2018, 12:51:37 GMT

In this blog, we explain how you can use your SAP S/4HANA system to execute machine learning tasks. We sketch a step-by-step guide on how to process the data and execute the relevant database procedure, and we discuss potential ways to expose the results. The advantages of this approach are that no data replication is needed and that the use of database procedures offers great performance. Note that this approach requires custom development on the S/4HANA system. You may wonder: why would you want to do machine learning on an S/4HANA system?

artificial intelligence, machine learning, procedure, (18 more...)

Country: Europe > Germany (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Machine learning is driving demand for data replication

#artificialintelligence | Mar-11-2017, 09:00:40 GMT

Data for the enterprise is now a currency of its own, yet many companies and institutions are still trying to navigate moving large volumes of data from on-premises systems to the cloud in an effort to capitalize on the value of data stored in many locations. "I think longer-term the economic advantage of using cloud environments are undeniable. The cost advantages of hosting information in the cloud, the benefits that come from the scalability of those environments is far surpassing capabilities that organizations can invest in themselves or their own data centers," said Paul Scott-Murphy, vice president of product management, big data/cloud, at WANdisco Inc. During the Google Cloud Next event, Scott-Murphy spoke with Stu Miniman (@stu), host of theCUBE, SiliconANGLE Media's mobile live streaming studio, at SiliconANGLE's Palo Alto, CA, studio to discuss the trends WANdisco is seeing with its customers, as well as news from Google Cloud Next. WANdisco's enterprise and institutional customers are all facing a similar problem: the availability of data, combined with where it is stored, makes it difficult for them to access the data and derive any benefit from it.

cloud computing, data mining, machine learning, (13 more...)

Country: North America > United States > California > Santa Clara County > Palo Alto (0.26)

Industry: Information Technology > Services (0.97)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.44)
Information Technology > Data Science > Data Mining > Big Data (0.37)
